Evaluation of Statistical and Machine Learning Systems
(Two Challenging Problems)
Olivier Binette
Duke University / American Institutes for Research
JSM 2022, Washington, DC
August 10, 2022
olivierbinette.ca
Overview
August 7, 2022
Two challenging evaluation problems:
1. the reliability of multiple systems estimation, and
2. the accuracy of entity resolution algorithms.
Where and what is our science of statistical evaluation? (It often seems fragmented or neglected in favor of modeling.)
What is Evaluation?
Systematic assessment of a model's performance and properties for the purpose of:
1. choosing the best model,
2. using models appropriately, and
3. understanding real-world effects.
Evaluation studies need to answer specific questions using appropriate methodology.
1. Reliability of Multiple Systems Estimation
The Problem
How many victims of human trafficking?
• Victims are hidden and hard to reach.
• Organizations like the police and NGOs only reach a small proportion of the victims.
How can we get a representative picture?
Multiple Systems Estimation (MSE)
How it works:
• Integrate data (observed victims) from multiple sources through record linkage.
• Perform a missing data analysis to estimate the number of unobserved victims.
[Figure: lists from the police, NGOs, and the public are combined into integrated data. Labels: number of observed victims, number of unobserved victims.]
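The two steps above can be illustrated in the simplest two-list case. The sketch below is not the method used in the talk: it assumes the records have already been linked (shared identifiers), that the two lists are independent, and it applies the classical Lincoln-Petersen estimator. The function name and the toy lists are hypothetical.

```python
def lincoln_petersen(list_a, list_b):
    """Two-list population size estimate, assuming independent lists.

    list_a, list_b: iterables of already-linked record identifiers
    (hypothetical toy data, not real case records).
    Returns (observed, estimated_unobserved, estimated_total).
    """
    a, b = set(list_a), set(list_b)
    both = len(a & b)      # individuals seen by both sources
    only_a = len(a - b)    # seen by the first source only
    only_b = len(b - a)    # seen by the second source only
    if both == 0:
        raise ValueError("No overlap between lists: the estimate is undefined.")
    observed = both + only_a + only_b
    # Under independence, the unobserved count is estimated by
    # only_a * only_b / both.
    unobserved = only_a * only_b / both
    return observed, unobserved, observed + unobserved


# Hypothetical linked identifiers from two sources.
police = ["v1", "v2", "v3", "v4", "v5", "v6"]
ngo = ["v4", "v5", "v6", "v7", "v8", "v9", "v10", "v11", "v12"]

observed, unobserved, total = lincoln_petersen(police, ngo)
print(observed, unobserved, total)
```

With more than two lists, or when lists are dependent, the missing data model has to be richer than this, which is where the assumptions discussed next come in.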
Does MSE Work?
Contentious question.
• 200 years of controversy!
• No ground truth to check results.
It's all about:
• Missing data assumptions
• Data sufficiency and robustness
• Inductive biases
Our Evaluation Proposal
Drop simulation studies that can give any result you like. Instead:
1. Perform sensitivity analyses.
2. Dig through data for pseudo ground truths.
3. Quantify the consequences of model assumptions.
4. Generate visual & meaningful assessments of robustness.
https://github.com/OlivierBinette/MSETools
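As one possible illustration of a sensitivity analysis, a two-list estimate can be recomputed under different assumed levels of dependence between the lists. This is a minimal sketch and not the MSETools interface: the function name, the counts, and the grid of odds ratios are all hypothetical. It uses the fact that, in a two-list table with odds ratio theta between inclusion in each list, the unobserved cell is theta * n10 * n01 / n11, where theta = 1 corresponds to independence.

```python
def total_under_dependence(n11, n10, n01, odds_ratio):
    """Estimated population size for two lists under an assumed odds ratio.

    n11: count seen by both lists; n10, n01: counts seen by one list only.
    odds_ratio: assumed dependence between lists (1.0 means independence).
    All names and values here are hypothetical illustrations.
    """
    if n11 <= 0:
        raise ValueError("n11 must be positive for the estimate to be defined.")
    n00 = odds_ratio * n10 * n01 / n11  # implied number of unobserved individuals
    return n11 + n10 + n01 + n00


counts = dict(n11=3, n10=3, n01=6)  # hypothetical overlap counts
sensitivity = {
    theta: total_under_dependence(**counts, odds_ratio=theta)
    for theta in (0.5, 1.0, 2.0, 4.0)
}
for theta, total in sensitivity.items():
    print(f"odds ratio {theta}: estimated total {total:.1f}")
```

The spread of the resulting totals shows how strongly the conclusion depends on an assumption that the data alone cannot check.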
Conclusion Regarding MSE
I don't think we've closed the discussion, but these evaluation tools provide significant practical insights.
I wish I had known more about the science of evaluation when going into this project.
• I feel like this science is or has been neglected. What do you think?
2. Evaluation of Entity Resolution Algorithms
Inventor Disambiguation at PatentsView.org
Inventor Disambiguation
Are they the same?
Goal:
• Cluster inventor mentions that refer to the same real-world person.
Evaluation metrics:
• Precision and recall
Benchmark datasets:
• Hand-disambiguated subsets of the data
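For clustering problems, precision and recall are commonly defined over record pairs. The sketch below assumes that pairwise definition (the slide does not say which variant is used); the function names and the toy mention identifiers are hypothetical.

```python
from itertools import combinations


def matching_pairs(membership):
    """Set of unordered record pairs that share a cluster.

    membership: dict mapping record id -> cluster id.
    """
    clusters = {}
    for record, cluster_id in membership.items():
        clusters.setdefault(cluster_id, []).append(record)
    pairs = set()
    for records in clusters.values():
        pairs.update(frozenset(p) for p in combinations(records, 2))
    return pairs


def pairwise_precision_recall(predicted, truth):
    """Pairwise precision and recall of a predicted clustering."""
    pred_pairs = matching_pairs(predicted)
    true_pairs = matching_pairs(truth)
    correct = len(pred_pairs & true_pairs)
    # Convention (an assumption): a metric with an empty denominator is 1.0.
    precision = correct / len(pred_pairs) if pred_pairs else 1.0
    recall = correct / len(true_pairs) if true_pairs else 1.0
    return precision, recall


# Hypothetical inventor mentions m1..m5.
predicted = {"m1": "A", "m2": "A", "m3": "A", "m4": "B", "m5": "B"}
truth = {"m1": "x", "m2": "x", "m3": "y", "m4": "y", "m5": "z"}

precision, recall = pairwise_precision_recall(predicted, truth)
print(precision, recall)
```

Here the prediction links four pairs, one of which is correct, and recovers one of the two true pairs.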
Evaluation should be straightforward... right!?
Evaluation is not straightforward.
We proposed new methodology for unbiased performance estimation based on sampling ground truth clusters.
• Representative performance estimates for the first time at PatentsView.org
• More cost-effective and practical (for PatentsView) than sampling record pairs or other approaches.
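To give a rough idea of how sampled ground truth clusters can yield a performance estimate, here is a simplified sketch for pairwise recall. It is not the estimator proposed in the talk or implemented in PatentsView-Evaluation: it assumes true clusters are sampled uniformly at random and uses a plain ratio estimator, which is consistent but only approximately unbiased. The function name and the toy data are hypothetical.

```python
from collections import Counter
from math import comb


def recall_from_sampled_clusters(sampled_true_clusters, predicted):
    """Ratio estimate of pairwise recall from a sample of true clusters.

    sampled_true_clusters: list of lists of record ids, each a complete
        ground truth cluster (assumed sampled uniformly at random).
    predicted: dict mapping record id -> predicted cluster id.
    """
    true_pairs = 0
    recovered_pairs = 0
    for cluster in sampled_true_clusters:
        true_pairs += comb(len(cluster), 2)
        # Pairs from this true cluster that the prediction keeps together.
        sizes = Counter(predicted[record] for record in cluster)
        recovered_pairs += sum(comb(k, 2) for k in sizes.values())
    if true_pairs == 0:
        raise ValueError("Sampled clusters contain no record pairs.")
    return recovered_pairs / true_pairs


# Hypothetical predicted clustering and two sampled ground truth clusters.
predicted = {"a": 1, "b": 1, "c": 2, "d": 3, "e": 3, "f": 3}
sample = [["a", "b", "c"], ["d", "e"]]

estimate = recall_from_sampled_clusters(sample, predicted)
print(estimate)
```

Because every true pair belongs to exactly one true cluster, recall decomposes over clusters, which is what makes cluster-level sampling workable. Precision does not decompose this way and needs a different treatment.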
Usage
https://github.com/PatentsView/PatentsView-Evaluation
Leave a star!
Conclusion
Concluding Thoughts
• Evaluation is often not straightforward.
• It is often neglected.
• We need to value it more.
• Where is our science of evaluation? Help me find it!
I want to hear your stories and thoughts.
olivier.binette@duke.edu
Papers
Soon on arXiv.
Available on my website.
arXiv: 2112.01594
Funding:
• American Institutes for Research
• NSERC Canada Graduate Scholarship
• NSF CAREER Award (Rebecca Steorts)
• ASA Travel award
• GitHub sponsors (individual contributors)
• G-Research PhD grant
Thank you!